mean-field regime
Two-layer neural network on infinite dimensional data: global optimization guarantee in the mean-field regime
Analysis of neural network optimization in the mean-field regime is important, as the setting allows for feature learning. Existing theory has been developed mainly for neural networks in finite dimensions, i.e., each neuron has a finite-dimensional parameter. However, infinite-dimensional input naturally arises in machine learning problems such as nonparametric functional data analysis and graph classification. In this paper, we develop a new mean-field analysis of a two-layer neural network in an infinite-dimensional parameter space. We first give a generalization error bound, which shows that the regularized empirical risk minimizer properly generalizes when the data size is sufficiently large, despite the neurons being infinite-dimensional. Next, we present two gradient-based optimization algorithms for infinite-dimensional mean-field networks, obtained by extending the recently developed particle optimization framework to the infinite-dimensional setting. We show that the proposed algorithms converge to the (regularized) global optimal solution, and moreover, that their rates of convergence are of polynomial order in the online setting and exponential order in the finite-sample setting, respectively. To our knowledge, this is the first quantitative global optimization guarantee for a neural network on infinite-dimensional input in the presence of feature learning.
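As a concrete illustration of the particle optimization framework the abstract refers to, here is a minimal sketch of noisy particle gradient descent on a mean-field two-layer network, with the infinite-dimensional input mimicked by a fine grid discretization. The data, hyperparameters, and discretization are illustrative assumptions, not the paper's algorithm.

```python
# Minimal sketch: noisy particle gradient descent for a mean-field
# two-layer network f(x) = (1/N) * sum_j a_j * tanh(<w_j, x>).
# A functional input x(t) is represented by its values on a D-point grid.
import numpy as np

rng = np.random.default_rng(0)
D = 200       # grid points discretizing the functional input (assumption)
N = 512       # number of neurons ("particles")
lam = 1e-3    # L2 regularization strength (assumption)
eta = 0.2     # step size; absorbs the 1/N output scaling, as usual in mean-field training
noise = 1e-4  # Langevin noise level (entropic regularization, assumption)

n = 100
X = rng.standard_normal((n, D)) / np.sqrt(D)   # synthetic inputs
y = np.sin(X.sum(axis=1))                      # synthetic targets

W = rng.standard_normal((N, D))  # inner weights: one "function" per neuron
a = rng.standard_normal(N)       # outer weights

for step in range(2000):
    act = np.tanh(X @ W.T)       # (n, N) activations
    resid = act @ a / N - y      # residuals of the 1/N-scaled output
    # Per-particle gradients of the regularized empirical risk.
    grad_a = act.T @ resid / n + lam * a
    grad_W = ((resid[:, None] * (1 - act**2) * a).T @ X) / n + lam * W
    # Noisy (Langevin) particle update.
    a -= eta * grad_a + np.sqrt(2 * eta * noise) * rng.standard_normal(N)
    W -= eta * grad_W + np.sqrt(2 * eta * noise) * rng.standard_normal((N, D))

print(float(np.mean(resid**2)))  # squared training error at the last iterate
```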
Communities in the Kuramoto Model: Dynamics and Detection via Path Signatures
Nguyên, Tâm Johan, Lee, Darrick, Stolz, Bernadette Jana
The behavior of multivariate dynamical processes is often governed by underlying structural connections that relate the components of the system. For example, brain activity, which is often measured via time series, is determined by an underlying structural graph, where nodes represent neurons or brain regions and edges represent cortical connectivity. Existing methods for inferring structural connections from observed dynamics, such as correlation-based or spectral techniques, may fail to fully capture complex relationships in high-dimensional time series in an interpretable way. Here, we propose the use of path signatures--a mathematical framework that encodes geometric and temporal properties of continuous paths--to address this problem. Path signatures provide a reparametrization-invariant characterization of dynamical data and, in particular, can be used to compute the lead matrix which reveals lead-lag phenomena. We showcase our approach on time series from coupled oscillators in the Kuramoto model defined on a stochastic block model graph, termed the Kuramoto stochastic block model (KSBM). Using mean-field theory and Gaussian approximations, we analytically derive reduced models of KSBM dynamics in different temporal regimes and theoretically characterize the lead matrix in these settings. Leveraging these insights, we propose a novel signature-based community detection algorithm, achieving exact recovery of structural communities from observed time series in multiple KSBM instances. Our results demonstrate that path signatures provide a novel perspective on analyzing complex neural data and other high-dimensional systems, explicitly exploiting temporal functional relationships to infer underlying structure.
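The lead matrix mentioned above is the antisymmetric part of the level-2 path signature: A_ij = (1/2)(S_ij - S_ji) with S_ij = ∫ (x_i - x_i(0)) dx_j, and A_ij > 0 indicates channel i tends to lead channel j. A minimal sketch of computing it for two coupled Kuramoto oscillators follows; the two-oscillator simulation and its parameters are illustrative stand-ins, not the paper's KSBM setup.

```python
import numpy as np

def lead_matrix(X):
    """X: (T, d) array of d time series sampled at T time points."""
    X0 = X - X[0]             # start the path at the origin
    dX = np.diff(X, axis=0)   # increments, shape (T-1, d)
    # For piecewise-linear paths, S_ij = ∫ x_i dx_j is computed exactly
    # using the midpoint value of x_i on each segment.
    mid = 0.5 * (X0[:-1] + X0[1:])
    S = mid.T @ dX            # level-2 signature matrix, (d, d)
    return 0.5 * (S - S.T)    # antisymmetric "lead" part

# Two Kuramoto oscillators where oscillator 0 drives oscillator 1; the
# detuned frequencies force oscillator 1 to phase-lock with a lag.
T, dt = 5000, 0.01
theta = np.zeros((T, 2))
omega = np.array([1.0, 0.8])  # natural frequencies (illustrative)
for t in range(T - 1):
    drive = np.array([0.0, np.sin(theta[t, 0] - theta[t, 1])])
    theta[t + 1] = theta[t] + dt * (omega + 2.0 * drive)

A = lead_matrix(np.sin(theta))
print(A)  # expect A[0, 1] > 0: oscillator 0 leads oscillator 1
```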
Emergence of Globally Attracting Fixed Points in Deep Neural Networks With Nonlinear Activations
Joudaki, Amir, Hofmann, Thomas
Understanding how neural networks transform input data across layers is fundamental to unraveling their learning and generalization capabilities. Although prior work has used insights from kernel methods to study neural networks, a global analysis of how the similarity between hidden representations evolves across layers remains underexplored. In this paper, we introduce a theoretical framework for the evolution of the kernel sequence, which measures the similarity between the hidden representations of two different inputs. Operating under the mean-field regime, we show that the kernel sequence evolves deterministically via a kernel map, which only depends on the activation function. By expanding the activation in Hermite polynomials and using their algebraic properties, we derive an explicit form for the kernel map and fully characterize its fixed points. Our analysis reveals that for nonlinear activations, the kernel sequence converges globally to a unique fixed point, which can correspond to orthogonal or similar representations depending on the activation and network architecture. We further extend our results to networks with residual connections and normalization layers, demonstrating similar convergence behaviors. This work provides new insights into the implicit biases of deep neural networks and how architectural choices influence the evolution of representations across layers.
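For ReLU, the kernel map discussed above has a known closed form (the normalized arc-cosine kernel of Cho & Saul); the paper's general analysis goes through Hermite expansions, but a minimal sketch of iterating the ReLU kernel map already illustrates a globally attracting fixed point. The depth and initial correlation below are illustrative.

```python
import numpy as np

def relu_kernel_map(rho):
    """Correlation of ReLU features at the next layer, given correlation rho
    at the current layer (normalized arc-cosine kernel)."""
    rho = np.clip(rho, -1.0, 1.0)
    return (np.sqrt(1 - rho**2) + (np.pi - np.arccos(rho)) * rho) / np.pi

rho = -0.9  # initial correlation between two inputs (illustrative)
for layer in range(1, 51):
    rho = relu_kernel_map(rho)
    if layer % 10 == 0:
        print(f"layer {layer:2d}: rho = {rho:.6f}")
# rho increases monotonically toward the unique fixed point rho* = 1:
# deep ReLU representations of different inputs become increasingly aligned.
```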
Understanding Transfer Learning via Mean-field Analysis
Aminian, Gholamali, Szpruch, Łukasz, Cohen, Samuel N.
We propose a novel framework for exploring generalization errors of transfer learning through the lens of differential calculus on the space of probability measures. In particular, we consider two main transfer learning scenarios, $\alpha$-ERM and fine-tuning with KL-regularized empirical risk minimization, and establish generic conditions under which we study the convergence rates of the generalization error and the population risk for these scenarios. Based on our theoretical results, we show the benefits of transfer learning with a one-hidden-layer neural network in the mean-field regime under some suitable integrability and regularity assumptions on the loss and activation functions.
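For reference, one way to write the fine-tuning scenario with KL-regularized empirical risk minimization over a measure $\mu$ is the following; the notation is illustrative, with $\mu_{\mathrm{src}}$ standing for the measure learned on the source task.

```latex
\min_{\mu}\; \frac{1}{n}\sum_{i=1}^{n} \ell\big(f_\mu(x_i),\, y_i\big)
  \;+\; \lambda\,\mathrm{KL}\big(\mu \,\|\, \mu_{\mathrm{src}}\big),
\qquad
f_\mu(x) = \int \phi(\theta, x)\, \mu(\mathrm{d}\theta).
```

The KL penalty keeps the fine-tuned measure close to the source measure, and its strictly convex entropy component is what makes the regularized problem well posed in the mean-field limit.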
Global Optimality of Elman-type RNN in the Mean-Field Regime
Agazzi, Andrea, Lu, Jianfeng, Mukherjee, Sayan
We analyze Elman-type Recurrent Neural Networks (RNNs) and their training in the mean-field regime. Specifically, we show convergence of gradient descent training dynamics of the RNN to the corresponding mean-field formulation in the large width limit. We also show that the fixed points of the limiting infinite-width dynamics are globally optimal, under some assumptions on the initialization of the weights. Our results establish optimality for feature-learning with wide RNNs in the mean-field regime.
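To fix ideas, here is a minimal sketch of an Elman-type RNN in mean-field scaling, with one parameter particle per hidden unit and a 1/N-averaged readout. The scalar unit-wise recurrence and all names are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

rng = np.random.default_rng(2)
N, d, T = 1024, 5, 20            # width, input dimension, sequence length

U = rng.standard_normal((N, d))  # input-to-hidden weights, one row per particle
w = rng.standard_normal(N)       # scalar hidden-to-hidden weight per unit
a = rng.standard_normal(N)       # readout weights

def forward(xs):
    """xs: (T, d) input sequence -> scalar prediction with 1/N averaging."""
    h = np.zeros(N)
    for x in xs:
        # Unit-wise Elman recurrence; keeping the recurrence diagonal is one
        # simple way to keep the particles exchangeable across units.
        h = np.tanh(U @ x + w * h)
    return a @ h / N             # mean-field output scaling

print(forward(rng.standard_normal((T, d))))
```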
Mean-Field Analysis of Two-Layer Neural Networks: Global Optimality with Linear Convergence Rates
Zhang, Jingwei, Huang, Xunpeng, Yu, Jincheng
Gradient-based optimization is a fundamental tool in machine learning and has witnessed great empirical success for training neural networks, despite the highly non-convex landscape of the objective. However, theoretical understanding of non-convex optimization in neural networks is quite limited. Recently, there has been much work that explains the success of gradient-based optimization in overparametrized neural networks, that is, neural networks with a massive number of hidden units. Under the overparametrization condition, the learning problem can be translated into minimizing a convex functional, hence circumventing the difficulties of analyzing non-convex objectives. It is worth mentioning that there has been much broader interest in analyzing the convergence of machine learning algorithms by formulating them as the problem of minimizing some (usually convex) functional of a measure, such as variational inference (Liu and Wang (2016); Liu (2017); Chewi et al. (2020)), generative adversarial networks (Johnson and Zhang (2019); Nitanda and Suzuki (2020)), and learning infinite-width neural networks (Chizat and Bach (2018); Mei et al. (2018); Nguyen and Pham (2020); Fang et al. (2021)). The key idea is to approximate the learning dynamics of the model parameters by an optimization over the space of probability measures on the model parameters under the overparametrization condition.
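The lifting described above can be written compactly; the notation is illustrative.

```latex
F(\mu) \;=\; \mathbb{E}_{(x,y)}\Big[\,\ell\Big(\int \phi(\theta, x)\,\mu(\mathrm{d}\theta),\; y\Big)\Big].
```

Since $\mu \mapsto \int \phi(\theta, x)\,\mu(\mathrm{d}\theta)$ is linear in $\mu$, the functional $F$ is convex over probability measures whenever $\ell(\cdot, y)$ is convex, even though the finite-width objective is non-convex in the individual parameters $\theta_1, \ldots, \theta_N$.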
Convergence of policy gradient for entropy regularized MDPs with neural network approximation in the mean-field regime
Kerimkulov, Bekzhan, Leahy, James-Michael, Šiška, David, Szpruch, Lukasz
We study the global convergence of policy gradient for infinite-horizon, continuous state and action space, entropy-regularized Markov decision processes (MDPs). We consider a softmax policy with (one-hidden layer) neural network approximation in a mean-field regime. Additional entropic regularization in the associated mean-field probability measure is added, and the corresponding gradient flow is studied in the 2-Wasserstein metric. We show that the objective function is increasing along the gradient flow. Further, we prove that if the regularization in terms of the mean-field measure is sufficient, the gradient flow converges exponentially fast to the unique stationary solution, which is the unique maximizer of the regularized MDP objective. Lastly, we study the sensitivity of the value function along the gradient flow with respect to regularization parameters and the initial condition. Our results rely on a careful analysis of a non-linear Fokker--Planck--Kolmogorov equation and extend the pioneering work of Mei et al. (2020) and Agarwal et al. (2020), which quantify the global convergence rate of policy gradient for entropy-regularized MDPs in the tabular setting.
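For intuition, here is a minimal sketch of the tabular, one-state (bandit) special case analyzed by Mei et al. (2020), which the paper extends to mean-field neural policies; the rewards, step size, and regularization weight are illustrative. With entropy weight tau > 0, the iterates converge exponentially fast to the unique maximizer, the softmax of r / tau.

```python
import numpy as np

def softmax(z):
    p = np.exp(z - z.max())
    return p / p.sum()

r = np.array([1.0, 0.5, 0.0])  # rewards for 3 actions (illustrative)
tau, eta = 0.2, 0.5            # entropy weight, step size (illustrative)
theta = np.zeros(3)            # softmax policy parameters

for _ in range(500):
    pi = softmax(theta)
    adv = r - tau * np.log(pi)           # entropy-regularized "advantage"
    theta += eta * pi * (adv - pi @ adv)  # exact gradient ascent on
                                          # J = <pi, r> + tau * H(pi)

print(np.round(softmax(theta), 4))  # learned policy
print(np.round(softmax(r / tau), 4))  # unique maximizer pi* = softmax(r/tau)
```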